Automatic recognition of German news focusing on future-directed beliefs and intentions
Authors
Abstract
We consider the classification of German news stories into those that focus on future-directed beliefs and intentions and those that do not. The method proposed in this article requires only a small set of labeled training data; instead of relying on manual annotation, we introduce German clues for the automatic identification of future-orientation, which are used to label Reuters news stories automatically. We describe the development of a high-precision automatic labeling procedure in a bootstrapping fashion. The first version of the labeling procedure uses the absence of clues for future-directedness as an indicator of non-future-directedness and automatically labels about one third of the Reuters news stories with high precision. A perceptron is then applied to the automatically labeled news stories in order to semi-automatically acquire an additional set of clues for non-future-directedness. The second version of the labeling procedure additionally uses these clues and achieves substantially higher recall; extended by a guessing step, it can even perform the classification with an error rate of 22.5%. We also investigate another way to increase recall: using the automatically labeled news stories as training data for statistical classifiers. Three different types of statistical classifiers are compared to determine which is best suited to this text classification task. The best statistical classifier, combined with the results of the improved automatic labeling, recognizes the two classes of news stories with an error rate of 19%.
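The bootstrapping loop in the abstract (clue-based labeling, perceptron-assisted clue acquisition, a second labeling pass) can be illustrated with a small sketch. This is not the authors' implementation: the clue list `FUTURE_CLUES`, the threshold `min_future_hits`, the rule that clue-rich stories are labeled future-directed, the whitespace tokenizer, and all function names are hypothetical. The abstract only states that the absence of future clues signals non-future-directedness and that a perceptron helps acquire non-future clues, which a human then vets.

```python
from collections import defaultdict

# HYPOTHETICAL placeholder clues: the article's actual German clue list is
# not given in the abstract.
FUTURE_CLUES = frozenset({"wird", "werden", "will", "soll", "plant", "künftig"})

FUTURE, NON_FUTURE = 1, -1
PUNCT = ".,;:!?\"'()"


def tokenize(text):
    """Lower-cased whitespace tokenizer that strips surrounding punctuation."""
    tokens = (w.strip(PUNCT).lower() for w in text.split())
    return [t for t in tokens if t]


def label_v1(text, min_future_hits=2):
    """First labeling pass (sketch).

    No future clue at all -> NON_FUTURE (the rule stated in the abstract).
    At least `min_future_hits` clue tokens -> FUTURE (an assumption here).
    Anything in between is left unlabeled (None).
    """
    hits = sum(1 for t in tokenize(text) if t in FUTURE_CLUES)
    if hits == 0:
        return NON_FUTURE
    if hits >= min_future_hits:
        return FUTURE
    return None


def train_perceptron(examples, epochs=10):
    """Binary perceptron on bag-of-words features; examples are (text, +1/-1)."""
    weights = defaultdict(float)
    bias = 0.0
    for _ in range(epochs):
        for text, label in examples:
            feats = set(tokenize(text))
            score = bias + sum(weights[f] for f in feats)
            if label * score <= 0:  # misclassified or on the decision boundary
                for f in feats:
                    weights[f] += label
                bias += label
    return weights, bias


def candidate_non_future_clues(weights, k=5):
    """Tokens with the most negative weights lean toward NON_FUTURE. They are
    only candidates for manual review (the 'semi-automatic' step)."""
    ranked = sorted(weights.items(), key=lambda kv: kv[1])
    return [t for t, w in ranked if w < 0][:k]


def label_v2(text, non_future_clues, min_future_hits=2):
    """Second pass (sketch): stories the first pass left unlabeled are labeled
    NON_FUTURE if they contain an accepted non-future clue."""
    label = label_v1(text, min_future_hits)
    if label is None and any(t in non_future_clues for t in tokenize(text)):
        return NON_FUTURE
    return label


# Toy usage on invented sentences (not Reuters data).
stories = [
    "Die Regierung wird die Steuern senken und plant neue Gesetze.",
    "Der Index schloss gestern mit einem Minus.",
    "Die Aktie fiel gestern deutlich.",
    "Das Unternehmen will expandieren und soll Filialen eröffnen.",
    "Der Umsatz stieg im vergangenen Quartal.",
]
auto = [(s, label_v1(s)) for s in stories]
train = [(s, y) for s, y in auto if y is not None]
weights, _ = train_perceptron(train)
clues = candidate_non_future_clues(weights)
print(clues)
print(label_v2("Die Bank soll gestern Verluste gemeldet haben.", set(clues)))
```

On these toy sentences the perceptron gives *gestern* ("yesterday") the most negative weight, so it heads the candidate list; the order of the tied candidates after it can vary between runs. The second pass then labels "Die Bank soll gestern Verluste gemeldet haben" as non-future even though it contains *soll*, which here is reportative rather than future-directed. That is the kind of case a single-clue rule leaves unlabeled.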
Similar resources
Automatic speech recognition and translation of a Swiss German dialect: Walliserdeutsch
Walliserdeutsch is a Swiss German dialect spoken in the south west of Switzerland. To investigate the potential of automatic speech processing of Walliserdeutsch, a small database was collected based mainly on broadcast news from a local radio station. Experiments suggest that automatic speech recognition is feasible: use of another (Swiss German) database shows that the small data size lends i...
The 300k LIMSI German broadcast news transcription system
This paper describes improvements to the existing LIMSI German broadcast news transcription system, especially its extension from a 65k vocabulary to 300k words. Automatic speech recognition for German is more problematic than for a language such as English in that the inflectional morphology of German and its highly generative process of compounding lead to many more out of vocabulary words fo...
A comparative study of HMM-based approaches for the automatic recognition of perceptually relevant aspects of spontaneous German speech melody
Three approaches to the speaker-independent automatic recognition of melodic aspects of spontaneous German are presented. All systems are based on Hidden Markov Models. Their input is restricted to the speech signal from which a feature extraction component derives eleven prosodic features. No additional information (as commonly used for prosody recognition) like word chains, word hypotheses,...
HMM-based Classification of Glottalization Phenomena in German-accented English
The present paper investigates the automatic detection of word-initial glottalization phenomena (glottal stops and creaky voice) in German-accented English by means of HMMs. Glottalization of word-initial vowels can be very frequent in German-accented English, as well as in German. Detection and classification of glottalization phenomena is useful in order to obtain a pre-segmentation of speech...
An Algorithm for Plan Recognition in Collaborative Discourse
A model of plan recognition in discourse must be based on intended recognition, distinguish each agent's beliefs and intentions from the other's, and avoid assumptions about the correctness or completeness of the agents' beliefs. In this paper, we present an algorithm for plan recognition that is based on the Shared-Plan model of collaboration (Grosz and Sidner, 1990; Lochbaum et al., 1990) and...
Journal title:
- Computer Speech & Language
Volume 22, Issue -
Pages -
Publication date: 2008